fix(quantization): emit axis on DequantizeLinear for per-channel dynamic quantization#28228
Conversation
`quantize_weight_per_channel` was storing `None` as the axis in the `QuantizedValue` map entry instead of the actual `channel_axis` argument. As a result, `_dequantize_value` would hit an `AssertionError` (scale not scalar) when the per-channel-quantized weight was also a graph output, and even on success it would emit a `DequantizeLinear` node with no `axis` attribute, producing semantically incorrect per-tensor dequantization.

Fix:
- Pass `channel_axis` (not `None`) when constructing `QuantizedValue` in `quantize_weight_per_channel`.
- Gate the scalar-scale assertion in `_dequantize_value` on `quantized_value.axis is None` (only required for per-tensor).
- Forward `axis=quantized_value.axis` to `onnx.helper.make_node` for `DequantizeLinear`; `make_node` ignores `axis=None` automatically, so the per-tensor path is unaffected.

Add regression test `test_dynamic_quantize_per_channel_emits_axis_attribute` that builds a minimal MatMul model with the weight also exposed as a graph output (so `_dequantize_outputs` fires on the per-channel weight), confirms quantization completes without error, and asserts the `axis` attribute is present on the resulting `DequantizeLinear` node with a multi-element scale.

Fixes microsoft#19997
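To see why a missing `axis` attribute is semantically wrong (not just cosmetic), here is a minimal NumPy sketch of per-channel quantization along axis 1 of a weight. The shapes and values are illustrative only, not taken from the PR; the point is that interpreting a per-channel-quantized tensor with a single per-tensor scale reconstructs different values.

```python
import numpy as np

# Per-channel quantization of a (4, 8) weight along axis=1: one scale per
# output channel. Shapes and values here are illustrative, not from the PR.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)

scale = np.abs(w).max(axis=0) / 127.0                      # shape (8,), one per channel
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Correct per-channel dequantization: broadcast the 1-D scale along axis=1,
# which is what the axis attribute on DequantizeLinear expresses.
per_channel = q.astype(np.float32) * scale

# What a per-tensor interpretation does (no axis attribute): a single scalar
# scale applied to every element, here the first channel's scale.
per_tensor = q.astype(np.float32) * scale[0]

print(np.abs(per_channel - w).max())         # small reconstruction error
print(np.allclose(per_channel, per_tensor))  # False: the two readings diverge
```

The per-channel reconstruction error is bounded by half a quantization step per channel, while the per-tensor reading distorts every channel whose scale differs from the one scalar used.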
Pull request overview
Fixes per-channel dynamic quantization so that per-channel weight quantization correctly propagates the channel axis into emitted DequantizeLinear nodes (and relaxes the scalar-scale assertion accordingly), addressing a failure mode reported in #19997.
Changes:
- Preserve `channel_axis` when creating `QuantizedValue` for per-channel quantized weights.
- Update `_dequantize_value` to (1) only enforce a scalar scale for per-tensor quantization (`axis is None`) and (2) emit `DequantizeLinear(axis=...)` for per-channel cases.
- Add a regression test ensuring `quantize_dynamic(per_channel=True)` emits a `DequantizeLinear` with an `axis` attribute and a 1-D per-channel scale initializer.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `onnxruntime/python/tools/quantization/onnx_quantizer.py` | Propagates per-channel axis into `QuantizedValue` and forwards it to `DequantizeLinear`; gates the scalar-scale assertion to the per-tensor path. |
| `onnxruntime/test/python/quantization/test_quant_issues.py` | Adds regression coverage that validates axis emission and per-channel (multi-element) scale initializer shape. |
> # Build a model: input (5, 4) @ weight (4, 8) -> output (5, 8).
> # The weight is also passed through Identity and exposed as a second graph
> # output so that _dequantize_outputs calls _dequantize_value on the
> # per-channel-quantized weight initializer.
> # Weight axis=1 is the output-feature axis (per-channel quantization target).
The test docstring/comments suggest the `_dequantize_outputs` -> `_dequantize_value` path is exercised because the per-channel weight is a graph output, but this model outputs `weight_out` (the Identity output), not the initializer `weight`. In practice, the `DequantizeLinear` insertion here is likely triggered when the quantizer processes the unsupported Identity node and dequantizes its (now-quantized) weight input. Updating the comment/docstring to match the actual mechanism would make the regression intent clearer and avoid confusion for future maintainers.
Summary
- Fix `quantize_dynamic(per_channel=True)` so weights quantized per-channel produce a `DequantizeLinear` node with the correct `axis` attribute.
- `quantize_weight_per_channel` now populates `QuantizedValue.axis` (was hardcoded to `None`).
- Gate the scalar-scale assertion in `_dequantize_value` on `axis is None` so per-channel scales (1-D tensors) are accepted.

Motivation
Fixes #19997.
When a model is quantized with `quantize_dynamic(..., per_channel=True)` and a per-channel weight reaches `_dequantize_value` (e.g. via `_dequantize_outputs` when the weight is in the graph outputs), two bugs surface:

1. `quantize_weight_per_channel` stores `QuantizedValue.axis = None` even though it received a real `channel_axis`, so the per-channel information is lost.
2. `_dequantize_value` (a) asserts `scale_init.size == 1`, which fails for a 1-D per-channel scale, and (b) builds the `DequantizeLinear` node without an `axis` attribute, producing an invalid ONNX node when the model is consumed.

PR #22283 (Nov 2024) softened the assertion against `None`-typed scales but left the underlying axis-propagation bug in place.
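The interaction of the two bugs can be sketched with a pure-Python stand-in. The names mirror the quantizer's, but this is an illustration of the control flow, not the actual `onnx_quantizer.py` code:

```python
import numpy as np

class QuantizedValue:
    # Toy stand-in for the quantizer's bookkeeping record (illustrative only).
    def __init__(self, name, axis=None):
        self.name = name
        self.axis = axis  # channel axis; None means per-tensor

def dequantize_value(qv, scale):
    # Pre-fix: the scalar-scale assertion fired unconditionally, so a 1-D
    # per-channel scale blew up whenever the axis was (wrongly) stored as None.
    # Post-fix: the assertion is gated on the per-tensor case and the axis is
    # forwarded onto the emitted node.
    if qv.axis is None:
        assert scale.size == 1, "scale must be scalar for per-tensor dequantization"
    return {"op_type": "DequantizeLinear", "axis": qv.axis}

scale = np.ones(8, dtype=np.float32)         # one scale per output channel
buggy = QuantizedValue("weight", axis=None)  # bug: channel_axis dropped
fixed = QuantizedValue("weight", axis=1)     # fix: channel_axis preserved

try:
    dequantize_value(buggy, scale)
    raised = False
except AssertionError:
    raised = True

node = dequantize_value(fixed, scale)
print(raised, node["axis"])  # True 1
```

With the axis preserved, the same 1-D scale passes through cleanly and the emitted node carries the channel axis.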
Changes

`onnxruntime/python/tools/quantization/onnx_quantizer.py`
- `quantize_weight_per_channel`: pass `channel_axis` (was `None`) into `QuantizedValue`.
- `_dequantize_value`: only require a scalar scale on the per-tensor path (`axis is None`); forward `axis=quantized_value.axis` to `onnx.helper.make_node("DequantizeLinear", ...)`. `make_node` silently omits the attribute when `axis` is `None`, so the per-tensor path is unchanged.

`onnxruntime/test/python/quantization/test_quant_issues.py`
- Add `test_dynamic_quantize_per_channel_emits_axis_attribute`, which builds a minimal MatMul model with the weight routed to a graph output (to force the `_dequantize_outputs` -> `_dequantize_value` path), runs `quantize_dynamic(per_channel=True)`, and asserts the emitted `DequantizeLinear` has the `axis` attribute and a 1-D multi-element scale initializer.

Test Plan
- `python -m pytest onnxruntime/test/python/quantization/test_quant_issues.py -xvs`: new test passes; existing test skipped as before.
- `python -m pytest onnxruntime/test/python/quantization/test_op_matmul.py`: 7 passed, 8 skipped (no regression).
- `python -m pytest onnxruntime/test/python/quantization/test_qdq.py -k per_channel`: 1 passed.
- `lintrunner -a` on changed files: clean.